Combining Manual and Automatic Prosodic Annotation for Expressive Speech Synthesis

نویسندگان

Sandrine Brognaux

Thomas François

Marco Saerens

چکیده

Text-to-speech has long been centered on the production of an intelligible message of good quality. More recently, interest has shifted to the generation of more natural and expressive speech. A major issue of existing approaches is that they usually rely on a manual annotation in expressive styles, which tends to be rather subjective. A typical related issue is that the annotation is strongly influenced – and possibly biased – by the semantic content of the text (e.g. a shot or a fault may incite the annotator to tag that sequence as expressing a high degree of excitation, independently of its acoustic realization). This paper investigates the assumption that human annotation of basketball commentaries in excitation levels can be automatically improved on the basis of acoustic features. It presents two techniques for label correction exploiting a Gaussian mixture and a proportional-odds logistic regression. The automatically re-annotated corpus is then used to train HMM-based expressive speech synthesizers, the performance of which is assessed through subjective evaluations. The results indicate that the automatic correction of the annotation with Gaussian mixture helps to synthesize more contrasted excitation levels, while preserving naturalness.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Design and Evaluation of Shared Prosodic Annotation for Spontaneous French Speech: From Expert Knowledge to Non-Expert Annotation

In the area of large French speech corpora, there is a demonstrated need for a common prosodic notation system allowing for easy data exchange, comparison, and automatic annotation. The major questions are: (1) how to develop a single simple scheme of prosodic transcription which could form the basis of guidelines for non-expert manual annotation (NEMA), used for linguistic teaching and researc...

متن کامل

Concept-to-speech generation by integrating syntagmatic features into HMM-based speech synthesis

In conventional concept-to-speech (CTS) methods, a common step is predicting abstract prosodic descriptions, such as the locations of accents and phrase boundaries, from the linguistic information provided by the text generation module. But the prediction results always contain errors, and unacceptable prosodic prediction may ruin the synthesized speech. In addition, linguistic information, whi...

متن کامل

Text and Speech Corpora for Text-To-Speech Synthesis of Tales

Text and speech corpora for training a tale telling robot have been designed, recorded and annotated. The aim of these corpora is to study expressive storytelling behaviour, and to help in designing expressive prosodic and co-verbal variations for the artificial storyteller). A set of 89 children tales in French serves as a basis for this work. The tales annotation principles and scheme are des...

متن کامل

Creation and analysis of a Polish speech database for use in unit selection synthesis

The main aim of this study is to describe the process of creating a speech database to be used in corpus based text-to-speech synthesis. To help achieve natural sounding speech synthesis, the database construction was aimed at rich phonetic and prosodic coverage based on variable length units (phoneme, diphone, triphone) from different phonetic and prosodic contexts. Following previous work on ...

متن کامل

A Bootstrapping Approach to Automating Prosodic Annotation for Limited-domain Synthesis

Most speech synthesis systems use symbolic prosody labels for marking emphasis and phrase structure, but in corpus-based approaches prosodic annotation of speech is a labor intensive process driving up the cost of development of new voices. This paper explores the potential for reducing that cost by using a bootstrapping approach to automatic prosodic annotation, particularly in a limited domai...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2016

Combining Manual and Automatic Prosodic Annotation for Expressive Speech Synthesis

نویسندگان

چکیده

منابع مشابه

Design and Evaluation of Shared Prosodic Annotation for Spontaneous French Speech: From Expert Knowledge to Non-Expert Annotation

Concept-to-speech generation by integrating syntagmatic features into HMM-based speech synthesis

Text and Speech Corpora for Text-To-Speech Synthesis of Tales

Creation and analysis of a Polish speech database for use in unit selection synthesis

A Bootstrapping Approach to Automating Prosodic Annotation for Limited-domain Synthesis

عنوان ژورنال:

اشتراک گذاری